
Commit fb54971

Fix HF -> Torchtitan Expert Conversion Sorting Bug (#1918)
The `expert_num` key is a string, so `sorted_expert_ids = sorted(experts.keys())` sorts the experts lexicographically (for example, `"10"` before `"2"`) instead of numerically for DeepSeek and Qwen3. As a result, converting a checkpoint from Hugging Face currently produces wrongly ordered experts. Round-tripping a state dict with more than 10 experts catches this bug. Fix: cast `expert_num` to `int`, as the type signature intended.
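The root cause can be reproduced in isolation: Python's `sorted` compares `str` keys character by character, so `"10"` and `"11"` land between `"1"` and `"2"`. A minimal sketch, assuming a hypothetical `experts` dict that stands in for one layer's per-expert weights:

```python
# Hypothetical per-layer mapping: expert id -> weight placeholder.
num_experts = 12

# Before the fix: expert ids parsed from HF keys stay as strings.
experts_str = {str(i): f"w{i}" for i in range(num_experts)}
# Lexicographic order: '0', '1', '10', '11', '2', '3', ...
sorted_str = sorted(experts_str.keys())

# After the fix: ids are cast to int, so sorting is numeric.
experts_int = {int(i): w for i, w in experts_str.items()}
sorted_int = sorted(experts_int.keys())

print(sorted_str[:4])  # ['0', '1', '10', '11']
print(sorted_int[:4])  # [0, 1, 2, 3]
```

With 10 or fewer experts every id is a single digit, so both orders agree; that is why the bug only shows up with more than 10 experts.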
1 parent b206439 commit fb54971


2 files changed: +2 / -2 lines


torchtitan/models/deepseek_v3/model/state_dict_adapter.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -171,7 +171,7 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
 if titan_abstract_key not in expert_weights_by_layer[layer_num]:
     expert_weights_by_layer[layer_num][titan_abstract_key] = {}
 expert_weights_by_layer[layer_num][titan_abstract_key][
-    expert_num
+    int(expert_num)
 ] = value
 
 if isinstance(value, DTensor):
```
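The changed line sits in the step of `from_hf` that groups per-expert Hugging Face weights by layer and by abstract key before they are combined. The sketch below imitates that grouping under stated assumptions: the HF key layout `model.layers.{L}.mlp.experts.{E}.{proj}.weight` and the helpers `group_expert_weights` and `stack_in_expert_order` are hypothetical, and the sketch returns a plain list where a real adapter would combine tensors (for example with `torch.stack`), so it runs without PyTorch.

```python
import re
from typing import Any

# Assumed HF key layout for routed experts (hypothetical for this sketch).
EXPERT_KEY = re.compile(r"model\.layers\.(\d+)\.mlp\.experts\.(\d+)\.(\w+)\.weight")


def group_expert_weights(
    hf_state_dict: dict[str, Any],
) -> dict[int, dict[str, dict[int, Any]]]:
    """Group expert weights as layer -> projection name -> expert id -> value."""
    grouped: dict[int, dict[str, dict[int, Any]]] = {}
    for key, value in hf_state_dict.items():
        match = EXPERT_KEY.fullmatch(key)
        if match is None:
            continue  # not a routed-expert weight
        layer_num, expert_num, proj = match.groups()
        by_proj = grouped.setdefault(int(layer_num), {}).setdefault(proj, {})
        # Cast to int, mirroring the fix, so the later sort is numeric.
        by_proj[int(expert_num)] = value
    return grouped


def stack_in_expert_order(experts: dict[int, Any]) -> list[Any]:
    """Return per-expert values ordered by numeric expert id."""
    return [experts[i] for i in sorted(experts.keys())]


# Usage: 12 experts in layer 0; each value is simply its expert id.
hf = {f"model.layers.0.mlp.experts.{e}.gate_proj.weight": e for e in range(12)}
grouped = group_expert_weights(hf)
print(stack_in_expert_order(grouped[0]["gate_proj"]))
```

Keeping the expert id as an `int` at insertion time means every later consumer of the dict gets numeric ordering for free, rather than each call site needing a `key=int` argument to `sorted`.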

torchtitan/models/qwen3/model/state_dict_adapter.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -131,7 +131,7 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
 if titan_abstract_key not in expert_weights_by_layer[layer_num]:
     expert_weights_by_layer[layer_num][titan_abstract_key] = {}
 expert_weights_by_layer[layer_num][titan_abstract_key][
-    expert_num
+    int(expert_num)
 ] = value
 
 if isinstance(value, DTensor):
```
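The commit message notes that round-tripping a state dict with more than 10 experts catches this bug. A minimal sketch of such a check, assuming simplified, hypothetical `to_hf`/`from_hf` functions that operate on a list of per-expert values (not the real adapter methods or their signatures), with a `cast_to_int` flag toggling the fix:

```python
import re

NUM_EXPERTS = 12  # > 10, so string and numeric orders differ


def to_hf(stacked: list[int], layer: int = 0) -> dict[str, int]:
    """Hypothetical inverse: split one stacked weight into per-expert HF keys."""
    return {
        f"model.layers.{layer}.mlp.experts.{e}.gate_proj.weight": w
        for e, w in enumerate(stacked)
    }


def from_hf(hf: dict[str, int], cast_to_int: bool) -> list[int]:
    """Hypothetical forward conversion: regroup per-expert keys and restack them."""
    experts: dict = {}
    for key, w in hf.items():
        expert_num = re.search(r"\.experts\.(\d+)\.", key).group(1)
        experts[int(expert_num) if cast_to_int else expert_num] = w
    # Same pattern as the adapters: order experts by their sorted ids.
    return [experts[k] for k in sorted(experts.keys())]


original = list(range(NUM_EXPERTS))  # stand-in: value i is expert i's weight
buggy = from_hf(to_hf(original), cast_to_int=False)
fixed = from_hf(to_hf(original), cast_to_int=True)
print(buggy == original)  # False
print(fixed == original)  # True
```

Using distinct per-expert values makes any reordering visible; a round-trip test whose experts all share the same value would pass even with the bug.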
