gpt-oss dtensor worker v2 does not support tensor parallel #1684

@jordane95

Description

Describe the bug

When fine-tuning gpt-oss with dtensor worker v2 and tensor parallelism, some of the model's tp_plan entries are not recognized by FSDP: for example, the attention sinks use `local_rowwise`, and the MoE layers use `gather` or `grouped_gemm`. After adding support for those plans in the translate function, it still raises a TypeError when calling `parallelize_module`, because the MoE parallel strategy imported from the transformers library has a different type than `ParallelStyle`.
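
For context, the translate step maps Hugging Face tp_plan strings to torch `ParallelStyle` objects. Below is a minimal sketch of that mapping; the helper name `translate_to_parallel_style` and the exact mapping are illustrative assumptions, not the worker's actual implementation:

```python
# Hypothetical sketch of the tp_plan-string -> ParallelStyle translation the
# issue refers to; the function name and mapping are illustrative assumptions.
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    ParallelStyle,
    RowwiseParallel,
)

_STYLE_MAP: dict[str, ParallelStyle] = {
    "colwise": ColwiseParallel(),
    "rowwise": RowwiseParallel(),
    # gpt-oss plan entries with no mapping here, so translation fails:
    #   "local_rowwise"           (attention sinks)
    #   "gather", "grouped_gemm"  (MoE layers)
}

def translate_to_parallel_style(plan_entry: str) -> ParallelStyle:
    try:
        return _STYLE_MAP[plan_entry]
    except KeyError:
        raise ValueError(f"Unknown tp_plan entry: {plan_entry!r}") from None
```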

Steps/Code to reproduce bug

python examples/run_sft.py cluster.gpus_per_node=4 policy.model_name=openai/gpt-oss-20b policy.dtensor_cfg.tensor_parallel_size=4
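
Running this fails inside `parallelize_module`. A minimal illustration of the second failure mode, assuming the MoE plan entries resolve to transformers' own tensor-parallel classes rather than subclasses of torch's `ParallelStyle` (the exact transformers class hierarchy here is an assumption):

```python
# Illustrative check mirroring what torch's parallelize_module expects:
# every value in the plan dict must be a torch ParallelStyle. If the MoE
# styles come from transformers' tensor-parallel integration and derive
# from a different base class, this is where the TypeError originates.
from torch.distributed.tensor.parallel import ParallelStyle

def validate_tp_plan(tp_plan: dict[str, object]) -> None:
    for module_path, style in tp_plan.items():
        if not isinstance(style, ParallelStyle):
            raise TypeError(
                f"{module_path}: expected a torch ParallelStyle, "
                f"got {type(style).__module__}.{type(style).__qualname__}"
            )
```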

Expected behavior

Fine-tuning gpt-oss-20b with dtensor worker v2 and tensor_parallel_size=4 should run without errors.

