
Commit 030879f

[Qwen3] Fix weight tying for Qwen3 according to Huggingface configs (#1633)
As titled. Only enable weight tying for the smaller model variants, matching the Hugging Face configs.
1 parent 9197908 commit 030879f

File tree

1 file changed: 3 additions, 0 deletions


torchtitan/experiments/qwen3/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -40,6 +40,7 @@
         qk_norm=True,
         hidden_dim=3072,
         rope_theta=1000000,
+        enable_weight_tying=True,
     ),
     "1.7B": Qwen3ModelArgs(
         vocab_size=151936,
@@ -52,6 +53,7 @@
         qk_norm=True,
         hidden_dim=6144,
         rope_theta=1000000,
+        enable_weight_tying=True,
     ),
     "4B": Qwen3ModelArgs(
         vocab_size=151936,
@@ -64,6 +66,7 @@
         qk_norm=True,
         hidden_dim=9728,
         rope_theta=1000000,
+        enable_weight_tying=True,
     ),
     "8B": Qwen3ModelArgs(
         vocab_size=151936,
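
For context, enable_weight_tying=True makes the output projection reuse the token-embedding weight matrix instead of keeping a separate one; per the Hugging Face Qwen3 configs, only the smaller variants tie these weights, which is why the flag is added to the three smaller configs here and not to the 8B one. Below is a minimal sketch of what weight tying typically looks like in a PyTorch model; the TinyLM class and the module names (tok_embeddings, output) are illustrative assumptions, not torchtitan's actual Qwen3 implementation.

    import torch
    import torch.nn as nn

    class TinyLM(nn.Module):
        """Minimal decoder-style LM illustrating embedding/output weight tying."""

        def __init__(self, vocab_size: int, dim: int, enable_weight_tying: bool = False):
            super().__init__()
            self.tok_embeddings = nn.Embedding(vocab_size, dim)
            self.output = nn.Linear(dim, vocab_size, bias=False)
            if enable_weight_tying:
                # Reuse the embedding matrix as the output projection weight,
                # so both layers train and checkpoint a single (vocab_size, dim) parameter.
                self.output.weight = self.tok_embeddings.weight

        def forward(self, tokens: torch.Tensor) -> torch.Tensor:
            h = self.tok_embeddings(tokens)  # (batch, seq, dim)
            return self.output(h)            # (batch, seq, vocab_size)

    # With tying enabled, both modules point at the same parameter tensor.
    model = TinyLM(vocab_size=151936, dim=1024, enable_weight_tying=True)
    assert model.output.weight.data_ptr() == model.tok_embeddings.weight.data_ptr()

Sharing the matrix saves one vocab_size x dim parameter block, which matters most for the small models where the embedding is a large fraction of total parameters.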
