
Commit 3a15e0c

[Fix Bug] Fix FA3 centralized-deployment bug (#3235)
* fix FA3 centralized-deployment bug * add qknorm parameters
1 parent afff4d3 commit 3a15e0c

File tree

1 file changed: +4 −1 lines changed


fastdeploy/model_executor/layers/attention/flash_attn_backend.py

Lines changed: 4 additions & 1 deletion
```diff
@@ -344,7 +344,7 @@ def forward_mixed(
             forward_meta.decoder_batch_ids,  # from buffer
             forward_meta.decoder_tile_ids_per_batch,  # from buffer
             forward_meta.decoder_num_blocks_cpu,
-            forward_meta.max_len_tensor_cpu,
+            metadata.max_len_tensor_cpu_decoder,
             metadata.max_len_kv,
             metadata.rotary_embs,
             forward_meta.attn_mask,
@@ -359,6 +359,9 @@ def forward_mixed(
             layer.linear_shift,
             layer.linear_smooth,
             metadata.kv_signal_data_list[layer.layer_id],
+            getattr(layer, "q_norm_weight", None),
+            getattr(layer, "k_norm_weight", None),
+            getattr(layer, "rms_norm_eps", 1e-6),
             metadata._fuse_kernel_compute_dtype,
             getattr(layer, "cache_quant_type_str", "none"),
             layer.use_neox_rotary_style,
```
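The added lines rely on `getattr` with a default value so that layers created before the qknorm feature existed still work unchanged: a layer without `q_norm_weight` simply passes `None` to the kernel. A minimal self-contained sketch of that pattern, with a hypothetical `AttentionLayer` stand-in and a hypothetical `collect_kernel_args` helper (neither is from the FastDeploy codebase):

```python
class AttentionLayer:
    """Hypothetical stand-in for an attention layer object."""

    def __init__(self, with_qk_norm: bool):
        if with_qk_norm:
            # Placeholder weights; real layers would hold tensors.
            self.q_norm_weight = [1.0, 1.0]
            self.k_norm_weight = [1.0, 1.0]
            self.rms_norm_eps = 1e-5


def collect_kernel_args(layer):
    # getattr(obj, name, default) returns the default when the attribute
    # is absent, so older layers without qk-norm fall back to None / 1e-6
    # instead of raising AttributeError.
    return (
        getattr(layer, "q_norm_weight", None),
        getattr(layer, "k_norm_weight", None),
        getattr(layer, "rms_norm_eps", 1e-6),
    )


old_layer = AttentionLayer(with_qk_norm=False)
new_layer = AttentionLayer(with_qk_norm=True)
print(collect_kernel_args(old_layer))     # (None, None, 1e-06)
print(collect_kernel_args(new_layer)[2])  # 1e-05
```

This keeps the kernel call signature backward compatible: adding a new optional parameter only requires choosing a sensible default, not migrating every existing layer definition.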

0 commit comments
