Commit 084d307

[doc] update DeepSeekV3ModelArgs doc string (#1598)
In this PR, I updated the outdated docstring for DeepSeekV3ModelArgs.
1 parent: b5b7ffb

File tree

1 file changed: +4 −7 lines changed
  • torchtitan/models/deepseek_v3/model


torchtitan/models/deepseek_v3/model/args.py

Lines changed: 4 additions & 7 deletions
@@ -34,20 +34,17 @@ class DeepSeekV3ModelArgs(BaseModelArgs):
     n_layers (int): Number of transformer layers.
     n_dense_layers (int): Number of dense layers in the model.
     n_heads (int): Number of attention heads.
-    n_routed_experts (int): Number of routed experts for MoE layers.
-    n_shared_experts (int): Number of shared experts for MoE layers.
-    n_activated_experts (int): Number of activated experts in MoE layers.
+    norm_eps (float): Epsilon value used for RMSNorm.
+    moe_args (MoEArgs): MoE configuration.
     n_expert_groups (int): Number of expert groups.
     n_limited_groups (int): Number of limited groups for MoE routing.
-    score_func (Literal["softmax", "sigmoid"]): Scoring function for MoE routing.
-    route_scale (float): Scaling factor for routing scores.
-    use_grouped_mm (bool): Whether to use grouped matrix multiplication for MoE layers.
-    load_balance_coeff (float | None): Auxiliary-Loss-Free Load balancing coefficient for MoE layers.
     q_lora_rank (int): LoRA rank for query projections.
     kv_lora_rank (int): LoRA rank for key-value projections.
     qk_nope_head_dim (int): Dimension for query-key projections without positional embeddings.
     qk_rope_head_dim (int): Dimension for query-key projections with rotary embeddings.
     v_head_dim (int): Dimension for value projections.
+    use_flex_attn (bool): Whether to use FlexAttention.
+    attn_mask_type (str): Type of attention mask.
     original_seq_len (int): Original sequence length.
     rope_theta (float): Base for rotary positional encoding.
     rope_factor (float): Scaling factor for extended sequence lengths.
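After this change, the documented attributes line up with a dataclass shaped roughly like the sketch below. Note this is an illustrative reconstruction: the default values and the minimal `MoEArgs` stand-in are assumptions, not torchtitan's actual definitions.

```python
from dataclasses import dataclass, field


@dataclass
class MoEArgs:
    """Hypothetical stand-in for torchtitan's MoEArgs; the real class
    lives in torchtitan and carries more configuration fields."""
    num_experts: int = 8


@dataclass
class DeepSeekV3ModelArgs:
    """Sketch of the argument container described by the updated docstring.

    Field names and their documented meanings come from the diff above;
    the defaults below are placeholders for illustration only.
    """
    n_layers: int = 27              # Number of transformer layers.
    n_dense_layers: int = 1         # Number of dense layers in the model.
    n_heads: int = 16               # Number of attention heads.
    norm_eps: float = 1e-5          # Epsilon value used for RMSNorm.
    moe_args: MoEArgs = field(default_factory=MoEArgs)  # MoE configuration.
    n_expert_groups: int = 1        # Number of expert groups.
    n_limited_groups: int = 1       # Limited groups for MoE routing.
    q_lora_rank: int = 0            # LoRA rank for query projections.
    kv_lora_rank: int = 512         # LoRA rank for key-value projections.
    qk_nope_head_dim: int = 128     # Q/K dim without positional embeddings.
    qk_rope_head_dim: int = 64      # Q/K dim with rotary embeddings.
    v_head_dim: int = 128           # Dimension for value projections.
    use_flex_attn: bool = False     # Whether to use FlexAttention.
    attn_mask_type: str = "causal"  # Type of attention mask.
    original_seq_len: int = 4096    # Original sequence length.
    rope_theta: float = 10000.0     # Base for rotary positional encoding.
    rope_factor: float = 40.0       # Scaling for extended sequence lengths.


args = DeepSeekV3ModelArgs()
print(args.moe_args.num_experts)
```

The point of the docstring update is visible here: the per-expert routing knobs (`n_routed_experts`, `score_func`, `route_scale`, and so on) no longer appear as top-level fields because that configuration has been folded into the single `moe_args` object.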
