Describe the bug
When fine-tuning gpt-oss with the DTensor worker v2 and tensor parallelism, some entries of the model's TP plan are not recognized by the FSDP path: for example, the attention sinks use `local_rowwise`, and the MoE layers use `gather` or `grouped_gemm`. After adding support for those entries in the translate function, `parallelize_module` still raises a `TypeError`, because the MoE parallel strategies imported from the transformers library are not of type `ParallelStyle`.
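A minimal sketch of the type mismatch, assuming a hypothetical `tp_plan` dict produced by the translate step (module names and the `TransformersMoEStyleStandIn` class are illustrative, not the real transformers classes): `parallelize_module` only accepts plan values that are torch `ParallelStyle` instances, so any transformers-specific MoE/sink strategy fails that check.

```python
# Sketch of the isinstance check that parallelize_module effectively enforces.
# Torch-native styles pass; a strategy object that does not subclass torch's
# ParallelStyle (here a stand-in for the transformers MoE strategies) does not.
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    ParallelStyle,
    RowwiseParallel,
)


class TransformersMoEStyleStandIn:
    """Stand-in for a transformers TP strategy that is not a torch ParallelStyle."""


# Hypothetical translated plan: module-name pattern -> style object.
tp_plan = {
    "model.layers.*.self_attn.q_proj": ColwiseParallel(),
    "model.layers.*.self_attn.o_proj": RowwiseParallel(),
    "model.layers.*.mlp.experts": TransformersMoEStyleStandIn(),  # e.g. gather / grouped_gemm
}

for name, style in tp_plan.items():
    if not isinstance(style, ParallelStyle):
        print(f"{name}: {type(style).__name__} is not a torch ParallelStyle "
              f"-> parallelize_module raises a TypeError for this entry")
```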
Steps/Code to reproduce bug
```
python examples/run_sft.py cluster.gpus_per_node=4 policy.model_name=openai/gpt-oss-20b policy.dtensor_cfg.tensor_parallel_size=4
```
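To see which TP plan entries the checkpoint declares, one option is to inspect the model config, assuming the gpt-oss config exposes `base_model_tp_plan` the way recent Llama-family configs in transformers do (the attribute name is an assumption here, not verified against the gpt-oss implementation):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("openai/gpt-oss-20b")
plan = getattr(config, "base_model_tp_plan", None) or {}
for module_pattern, strategy in plan.items():
    print(module_pattern, "->", strategy)
# Entries such as "local_rowwise", "gather", or "grouped_gemm" are the ones
# the translate step does not map to a torch ParallelStyle.
```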
Expected behavior
SFT fine-tuning of gpt-oss-20b with tensor_parallel_size=4 should parallelize the model and train without raising a TypeError in parallelize_module.