
【Inference Optimize】optimize DeepSeek_v3 #3349


Open
wants to merge 2 commits into develop

Conversation

chang-wenbin (Collaborator) commented Aug 12, 2025

This PR eliminates redundant calculations in the DeepSeek-V3 attention forward pass by hoisting the work shared between the prefill and decode (PD mix) paths so it is computed once:

    fmha_out = None
    # NOTE(changwenbin): hoist the computation shared by the prefill and decode
    # (PD mix) paths so it runs once instead of being repeated in each branch.
    # Low-rank query projection: down-project, normalize, then up-project.
    query = self.q_a_proj(hidden_states)
    query = self.q_a_layernorm(query)
    query = self.q_b_proj(query)
    query = query.reshape([-1, self.num_attention_heads_tp, self.qk_head_dim])
    # Split each query head into its non-positional and RoPE parts.
    query_nope, query_pe = query.split([self.qk_nope_head_dim, self.qk_rope_head_dim], axis=-1)
    # Joint low-rank KV projection: latent KV plus a single shared key RoPE part.
    compressed_kv = self.kv_a_proj_with_mqa(hidden_states)
    compressed_kv, key_pe = compressed_kv.split([self.kv_lora_rank, self.qk_rope_head_dim], axis=-1)
    key_pe = key_pe.reshape([-1, 1, self.qk_rope_head_dim])
    compressed_kv = self.kv_a_layernorm(compressed_kv)
    # Apply rotary position embedding to the positional parts only.
    query_pe, key_pe = self.rotary_emb(position_ids, query_pe, key_pe)
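
For context, a minimal sketch of how the hoisted tensors might then be consumed once by either path; forward_prefill, forward_decode, and forward_meta are hypothetical names for illustration, not the PR's actual API:

    # Hypothetical branch structure: both paths reuse the tensors computed
    # above instead of recomputing the projections inside each branch.
    if forward_meta.is_prefill:
        # Prefill (encoder) path; can use FA3 when FLAGS_flash_attn_version=3.
        fmha_out = self.forward_prefill(query_nope, query_pe, compressed_kv, key_pe, forward_meta)
    else:
        # Decode path consumes the same hoisted tensors.
        fmha_out = self.forward_decode(query_nope, query_pe, compressed_kv, key_pe, forward_meta)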

The encoder (prefill) path uses FlashAttention-3 (FA3); to enable it, export FLAGS_flash_attn_version=3 before launching.
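
A minimal sketch of enabling the flag from Python rather than the shell, assuming (an assumption, not confirmed by the PR) that Paddle reads FLAGS_* environment variables at import time:

    import os

    # Assumption: FLAGS_* environment variables are picked up when Paddle
    # initializes, so this must run before the first import of paddle.
    os.environ["FLAGS_flash_attn_version"] = "3"

    import paddle  # the encoder attention path can now select FA3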

paddle-bot (bot) commented Aug 12, 2025

Thanks for your contribution!
