Fix HF -> Torchtitan Expert Conversion Sorting Bug (#1918)

jthomy · web-flow · commit fb549717a783 · 2025-10-17T17:54:32.000-07:00
`expert_num` is a string, so `sorted_expert_ids =
sorted(experts.keys())` sorts the expert ids lexicographically instead
of numerically for DeepSeek and Qwen3 (for example, "10" sorts before
"2").
As a result, converting from Hugging Face currently produces wrongly
ordered experts. Roundtripping a state dict with more than 10 experts
catches this bug.
Fix: cast `expert_num` to `int`, as the type signature intended.
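A minimal sketch of the ordering problem, assuming a hypothetical
12-expert dictionary with string labels standing in for the real weight
tensors (`str_keyed` and `int_keyed` are illustrative names, not part of
the adapter):

```python
# Hypothetical stand-in for the per-expert weights collected in from_hf;
# the values are plain labels here rather than tensors.
str_keyed = {str(i): f"expert_{i}" for i in range(12)}

# Same data, with keys cast to int as in this fix.
int_keyed = {int(k): v for k, v in str_keyed.items()}

# String keys are compared character by character, so "10" and "11"
# land between "1" and "2".
lexicographic_order = sorted(str_keyed.keys())

# Integer keys are compared by value, which is the intended expert order.
numeric_order = sorted(int_keyed.keys())

print(lexicographic_order)
print(numeric_order)
```

With 10 or fewer experts every id is a single digit and both orderings
agree, which is why the bug only appears for larger expert counts.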
diff --git a/torchtitan/models/deepseek_v3/model/state_dict_adapter.py b/torchtitan/models/deepseek_v3/model/state_dict_adapter.py
@@ -171,7 +171,7 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
                 if titan_abstract_key not in expert_weights_by_layer[layer_num]:
                     expert_weights_by_layer[layer_num][titan_abstract_key] = {}
                 expert_weights_by_layer[layer_num][titan_abstract_key][
-                    expert_num
+                    int(expert_num)
                 ] = value
 
                 if isinstance(value, DTensor):
diff --git a/torchtitan/models/qwen3/model/state_dict_adapter.py b/torchtitan/models/qwen3/model/state_dict_adapter.py
@@ -131,7 +131,7 @@ def from_hf(self, hf_state_dict: dict[str, Any]) -> dict[str, Any]:
                 if titan_abstract_key not in expert_weights_by_layer[layer_num]:
                     expert_weights_by_layer[layer_num][titan_abstract_key] = {}
                 expert_weights_by_layer[layer_num][titan_abstract_key][
-                    expert_num
+                    int(expert_num)
                 ] = value
 
                 if isinstance(value, DTensor):