
[Bug] QWEN3 VL 30B Multi GPU Train #4066

@alien087

Description


Did you update (`pip install --upgrade unsloth unsloth_zoo`)? Yes
Colab or Kaggle or local / cloud? Local
Number of GPUs used (`nvidia-smi`): 2x L40S
Which notebook? Please link! https://pastebin.com/ZG0vf0P9
Which Unsloth, TRL, transformers, PyTorch versions? Unsloth 2026.2.1, TRL 0.24.0, transformers 4.57.6, PyTorch 2.10.0+cu126
Which trainer? SFTTrainer

Environment:

- OS: Ubuntu
- Python: 3.11.14
- Trainer: SFTTrainer
- Model: FastVisionModel
- GPU: 2x L40S
- Env: Conda

Issue:

I'm trying to fine-tune Qwen/Qwen3-VL-30B-A3B-Instruct on 2x L40S using multi-GPU sharding via `device_map = "balanced"`:

```python
model, tokenizer = FastVisionModel.from_pretrained(
    model_name,
    load_in_4bit=load_in_4bit,
    full_finetuning=False,
    use_gradient_checkpointing="unsloth",
    device_map="balanced",
)
```

full script: https://pastebin.com/ZG0vf0P9
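For context, `device_map = "balanced"` asks Accelerate to spread the model's layers roughly evenly across the available GPUs. Here is a minimal pure-Python sketch of that partitioning idea (this is not Accelerate's actual algorithm, and the layer names and sizes are made up for illustration):

```python
# Rough sketch of a "balanced" layer-to-GPU assignment (illustration only;
# Accelerate's real device_map="balanced" also accounts for memory limits,
# tied weights, and modules that must not be split across devices).
def balanced_device_map(layer_sizes, num_gpus=2):
    """Greedily assign each layer to the GPU with the smallest total load so far."""
    loads = [0] * num_gpus                 # size currently assigned per GPU
    device_map = {}
    for name, size in layer_sizes.items():
        gpu = loads.index(min(loads))      # least-loaded GPU wins
        device_map[name] = f"cuda:{gpu}"
        loads[gpu] += size
    return device_map

# Hypothetical layer sizes (MB), for illustration only
layers = {"embed": 500, "layer.0": 300, "layer.1": 300, "layer.2": 300, "lm_head": 500}
print(balanced_device_map(layers, num_gpus=2))
```

Once layers live on different GPUs this way, any op that mixes tensors from two devices must move its inputs onto one device first; the `q_idx + q_offset` add in the traceback below is exactly such a mixed-device op.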

but I get the following error:

```
torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_function <built-in function add>(*(FakeTensor(..., device='cuda:1', size=(), dtype=torch.int32), FakeTensor(..., device='cuda:0', size=(), dtype=torch.int64)), **{}): got RuntimeError('Unhandled FakeTensor Device Propagation for aten.add.Tensor, found two different devices cuda:1, cuda:0')

from user code:
   File "/home/pitai/.conda/envs/ft-io/lib/python3.11/site-packages/torch/nn/attention/flex_attention.py", line 1645, in _flex_attention_hop_wrapper
    return flex_attention_hop(*args, **kwargs)
  File "/home/pitai/.conda/envs/ft-io/lib/python3.11/site-packages/transformers/masking_utils.py", line 165, in inner_mask
    return mask_function(batch_idx, head_idx, q_idx + q_offset, kv_idx + kv_offset)

Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
```
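If it helps triage: as an untested workaround idea, disabling TorchDynamo compilation should make the flex-attention mask function run eagerly instead of being traced with FakeTensors (which is where the cuda:0 / cuda:1 add fails). `TORCHDYNAMO_DISABLE` is a standard PyTorch environment variable; whether Unsloth still runs acceptably with compilation off is an assumption.

```python
import os

# Untested workaround idea: turn off TorchDynamo compilation so the
# flex-attention mask is evaluated eagerly rather than traced with
# FakeTensors across devices. Must be set before importing torch/unsloth.
os.environ["TORCHDYNAMO_DISABLE"] = "1"
```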
